
    CADISHI: Fast parallel calculation of particle-pair distance histograms on CPUs and GPUs

    We report on the design, implementation, optimization, and performance of the CADISHI software package, which calculates histograms of pair-distances of ensembles of particles on CPUs and GPUs. These histograms represent 2-point spatial correlation functions and are routinely calculated from simulations of soft and condensed matter, where they are referred to as radial distribution functions, and in the analysis of the spatial distributions of galaxies and galaxy clusters. Although conceptually simple, the calculation of radial distribution functions via distance binning requires the evaluation of $\mathcal{O}(N^2)$ particle-pair distances where $N$ is the number of particles under consideration. CADISHI provides fast parallel implementations of the distance histogram algorithm for the CPU and the GPU, written in templated C++ and CUDA. Orthorhombic and general triclinic periodic boxes are supported, in addition to the non-periodic case. The CPU kernels feature cache-blocking, vectorization and thread-parallelization to obtain high performance. The GPU kernels are tuned to exploit the memory and processor features of current GPUs, demonstrating histogramming rates of up to a factor 40 higher than on a high-end multi-core CPU. To enable high-throughput analyses of molecular dynamics trajectories, the compute kernels are driven by the Python-based CADISHI engine. It implements a producer-consumer data processing pattern and thereby enables the complete utilization of all the CPU and GPU resources available on a specific computer, independent of special libraries such as MPI, covering commodity systems up to high-end HPC nodes. Data input and output are performed efficiently via HDF5. (...) The CADISHI software is freely available under the MIT license. Comment: 19 pages
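
To make the distance-binning idea in the abstract above concrete, here is a minimal pure-Python sketch of an O(N^2) pair-distance histogram. It is not CADISHI code and does not use the CADISHI API: the function name `distance_histogram` and its parameters are hypothetical, and only the non-periodic and orthorhombic periodic cases are covered (the triclinic case and all cache-blocking, vectorization and GPU tuning are left out).

```python
import math


def distance_histogram(coords, n_bins, r_max, box=None):
    """Histogram all pair distances below r_max into n_bins equal-width bins.

    coords: list of (x, y, z) tuples.
    box:    optional (Lx, Ly, Lz) of an orthorhombic periodic box; when given,
            the minimum-image convention is applied per axis (physically
            meaningful for r_max up to half the shortest box length).
    Hypothetical helper for illustration, not part of CADISHI.
    """
    hist = [0] * n_bins
    scale = n_bins / r_max  # converts a distance into a bin index
    n = len(coords)
    for i in range(n - 1):
        pi = coords[i]
        for j in range(i + 1, n):  # each unordered pair is visited once
            pj = coords[j]
            d2 = 0.0
            for k in range(3):
                d = pi[k] - pj[k]
                if box is not None:
                    # Wrap the separation to the nearest periodic image.
                    d -= box[k] * round(d / box[k])
                d2 += d * d
            r = math.sqrt(d2)
            if r < r_max:
                hist[int(r * scale)] += 1
    return hist


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
print(distance_histogram(points, 3, 3.0))                        # non-periodic
print(distance_histogram(points, 3, 1.5, box=(3.0, 3.0, 3.0)))   # periodic
```

The double loop visits N(N-1)/2 pairs, which is the quadratic cost the abstract refers to; a production kernel would process the pairs in blocks and in parallel rather than one at a time.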

    A massively parallel semi-Lagrangian solver for the six-dimensional Vlasov-Poisson equation

    This paper presents an optimized and scalable semi-Lagrangian solver for the Vlasov-Poisson system in six-dimensional phase space. Grid-based solvers of the Vlasov equation are known to give accurate results. At the same time, these solvers are challenged by the curse of dimensionality, which results in very high memory requirements and calls for highly efficient parallelization schemes. In this paper, we consider the 6d Vlasov-Poisson problem discretized by a split-step semi-Lagrangian scheme, using successive 1d interpolations on 1d stripes of the 6d domain. Two parallelization paradigms are compared, a remapping scheme and a classical domain decomposition approach applied to the full 6d problem. From numerical experiments, the latter approach is found to be superior in the massively parallel case in various respects. We address the challenge of artificial time step restrictions due to the decomposition of the domain by introducing a blocked one-sided communication scheme for the purely electrostatic case and a rotating mesh for the case with a constant magnetic field. In addition, we propose a pipelining scheme that makes it possible to hide the cost of the halo communication between neighboring processes efficiently behind useful computation. Parallel scalability on up to 65k processes is demonstrated for benchmark problems on a supercomputer.
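
The "successive 1d interpolations on 1d stripes" mentioned above can be illustrated with a single semi-Lagrangian step for constant-velocity advection on one periodic stripe. This is a sketch under assumptions, not the paper's solver: the function name `advect_1d` is hypothetical, and linear interpolation is used here only for brevity, whereas the abstract does not state which interpolation the authors use.

```python
import math


def advect_1d(f, velocity, dt, dx):
    """One semi-Lagrangian step for f_t + velocity * f_x = 0 on a periodic grid.

    Each grid point traces its characteristic back by velocity * dt and takes
    the linearly interpolated old value found there. Linear interpolation is an
    assumption made for this illustration.
    """
    n = len(f)
    shift = velocity * dt / dx  # displacement measured in grid cells
    new_f = [0.0] * n
    for i in range(n):
        x = i - shift            # foot of the characteristic, in index units
        j = math.floor(x)        # grid point to the left of the foot
        w = x - j                # fractional offset within the cell, in [0, 1)
        new_f[i] = (1.0 - w) * f[j % n] + w * f[(j + 1) % n]
    return new_f


stripe = [0.0, 1.0, 0.0, 0.0]
print(advect_1d(stripe, 1.0, 1.0, 1.0))  # shift by exactly one cell
print(advect_1d(stripe, 0.5, 1.0, 1.0))  # shift by half a cell
```

In a split-step scheme such a 1d update would be applied along each of the six phase-space directions in turn. In a domain-decomposed setting the foot of the characteristic may lie in a neighboring process's subdomain, which is where halo communication comes in.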

    Complexity Bounds for Block-IPs

    We consider integer programs (IPs) with a certain block structure, called two-stage stochastic. A two-stage stochastic IP is an integer program of the form $\min\{c^Tx \mid Ax=b,\, \ell\leq x\leq u,\, x\in \mathbb{Z}^{s + nt}\}$ where the constraint matrix $A\in \mathbb{Z}^{rn \times s+tn}$ consists of blocks $A^{(i)} \in \mathbb{Z}^{r\times s}$ on a vertical line and blocks $B^{(i)}\in \mathbb{Z}^{r\times t}$ on the diagonal line aside. We improve the bound for the Graver complexity of two-stage stochastic IPs. Our bound of $3^{O(s^s(2r||A||_\infty+1)^{rs})}$ reduces the dependency from $rs^2$ to $rs$ and is asymptotically tight under the exponential time hypothesis in the case that $r=1$. The improved Graver complexity bound stems from improved bounds on the intersection for a class of structurally rich integer cones. Our bound of $3^{O(d\Delta)^d}$ for dimension $d$ and absolute entries bounded by $\Delta$ is independent of the number of intersected integer cones. We investigate special properties of this class, which is complemented by the fact that these properties do not hold for general integer cones. Moreover, we give structural characterizations of this class that admit their use for two-stage stochastic IPs.
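
The block structure described above (blocks A^(i) stacked in a vertical line, blocks B^(i) on the diagonal next to them) can be made explicit with a small construction. This is only an illustrative sketch: the function name `two_stage_matrix` and the nested-list representation are assumptions, not anything taken from the paper.

```python
def two_stage_matrix(a_blocks, b_blocks):
    """Assemble the (r*n) x (s + t*n) two-stage stochastic constraint matrix.

    a_blocks: list of n matrices of shape r x s (the vertical line of blocks).
    b_blocks: list of n matrices of shape r x t (the diagonal blocks).
    Matrices are nested lists of integers; hypothetical helper for illustration.
    """
    n = len(a_blocks)
    r = len(a_blocks[0])
    s = len(a_blocks[0][0])
    t = len(b_blocks[0][0])
    rows = []
    for i in range(n):
        for k in range(r):
            # First s columns hold A^(i); the remaining t*n columns are zero
            # except for the i-th group of t columns, which holds B^(i).
            row = list(a_blocks[i][k]) + [0] * (t * n)
            row[s + i * t : s + (i + 1) * t] = b_blocks[i][k]
            rows.append(row)
    return rows


# n = 2 scenarios, r = 1 row per block, s = 2 shared and t = 1 local variables.
matrix = two_stage_matrix([[[1, 2]], [[3, 4]]], [[[5]], [[6]]])
for row in matrix:
    print(row)
```

The first s columns correspond to variables shared by all n row blocks, while each group of t columns is touched by only one row block.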

    Single-hole transistor in p-type GaAs/AlGaAs heterostructures

    A single-hole transistor is patterned in a p-type, C-doped GaAs/AlGaAs heterostructure by AFM oxidation lithography. Clear Coulomb blockade resonances have been observed at T = 300 mK. A charging energy of ~1.5 meV is extracted from Coulomb diamond measurements, in agreement with the lithographic dimensions of the dot. The absence of excited states in Coulomb diamond measurements, as well as the temperature dependence of Coulomb peak heights, indicates that the dot is in the multi-level transport regime. Fluctuations in peak spacings larger than the estimated mean single-particle level spacing are observed. Comment: 4 pages, 5 figures